Activating Visual Context and Commonsense Reasoning through Masked Prediction in VLMs
Yu, Jiaao; Li, Shenwei; Han, Mingjie; Yin, Yifei; Song, Wenzheng; Jia, Chenghao; Lan, Man
Recent breakthroughs in reasoning models have markedly advanced the reasoning capabilities of large language models, particularly via training on tasks with verifiable rewards. Yet a significant gap persists in their adaptation to real-world multimodal scenarios, most notably vision-language tasks, due to a heavy focus on single-modal language settings. While efforts to transplant reinforcement learning techniques from NLP to Vision-Language Models (VLMs) have emerged, these approaches often remain confined to perception-centric tasks or reduce images to textual summaries, failing to fully exploit visual context and commonsense knowledge, ultimately constraining the generalization of reasoning capabilities across diverse multimodal environments. To address this limitation, we introduce a novel fine-tuning task, Masked Prediction via Context and Commonsense (MPCC), which forces models to integrate visual context and commonsense reasoning by reconstructing semantically meaningful content from occluded images, thereby laying the foundation for generalized reasoning. To systematically evaluate the model's performance in generalized reasoning, we developed a specialized evaluation benchmark, MPCC-Eval, and employed various fine-tuning strategies to guide reasoning. Among these, we introduced an innovative training method, Reinforcement Fine-Tuning with Prior Sampling, which not only enhances model performance but also improves its generalized reasoning capabilities in out-of-distribution (OOD) and cross-task scenarios. Code and data are available at yjainqdc.
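The occlusion at the heart of MPCC can be illustrated with a minimal sketch: hide one region of an image so the model must infer the missing content from surrounding context. The patch size, masking value, and single-patch scheme here are hypothetical simplifications; the paper's actual masking procedure may differ.

```python
import numpy as np

def occlude_patch(image, rng, patch=32):
    """Zero out one random square patch of an (H, W, C) image array.

    A minimal sketch of the kind of occlusion an MPCC-style task might
    apply before asking a VLM to describe the hidden content. The patch
    size and zero-masking are illustrative assumptions, not the paper's
    exact recipe.
    """
    h, w = image.shape[:2]
    y = int(rng.integers(0, max(h - patch, 1)))
    x = int(rng.integers(0, max(w - patch, 1)))
    masked = image.copy()
    masked[y:y + patch, x:x + patch] = 0
    # Return the mask coordinates so a reconstruction target can be cut
    # from the original image at the same location.
    return masked, (y, x, patch)
```

The returned coordinates let a training loop pair each occluded image with the ground-truth content of the hidden region.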
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.92)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Reviews: Detecting Overfitting via Adversarial Examples
The work addresses the issue of neural networks overfitting to test sets on classification tasks due to widespread reuse of the same datasets throughout the community, and how that affects the credibility of reported test error rates, which should reflect performance on 'truly new' data from the same distribution. The proposed test statistic does not affect the training procedure and is simple in theory: if the (importance-reweighted) empirical risk and the empirical risk of adversarially-perturbed examples differ by more than a certain threshold (given by concentration bounds), the null hypothesis that the classifier and the test data are independent is rejected. My main concern is that the type of adversarial examples used, bounded translational shifts (for image data), is very limited and likely to be unrealistic. Effectively shifting the frame of a CIFAR image is quite different from swapping items in a scene; it is less subtle and less 'insidious', unless perhaps a "7" is converted via truncation into a "1". It would have been nice to see example adversarial images for a sense of how they compare to the ones typically discussed in the literature, particularly as a selling point of the work is the use of adversarial examples.
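The test statistic described in the review can be sketched as a two-sample comparison with a concentration-bound threshold. This is a hedged simplification: it omits the paper's importance reweighting and uses a generic Hoeffding-style bound with an illustrative constant, so the function name and threshold are assumptions, not the paper's exact statistic.

```python
import numpy as np

def overfitting_test(clean_losses, adv_losses, alpha=0.05):
    """Reject the null hypothesis that classifier and test data are
    independent when the gap between the empirical risk and the risk on
    adversarially-perturbed (e.g. shifted) examples exceeds a
    Hoeffding-style concentration bound.

    Losses are assumed to lie in [0, 1]; the reweighting step from the
    paper is omitted, and the bound's constant is illustrative.
    """
    clean_losses = np.asarray(clean_losses, dtype=float)
    adv_losses = np.asarray(adv_losses, dtype=float)
    n = len(clean_losses)
    gap = abs(adv_losses.mean() - clean_losses.mean())
    # Hoeffding: for [0,1]-bounded means, a gap above this threshold is
    # unlikely (probability <= alpha) under the independence null.
    threshold = np.sqrt(2.0 * np.log(2.0 / alpha) / n)
    return bool(gap > threshold)
```

Note the appeal of the design the review mentions: the test only consumes per-example losses, so the training procedure itself is untouched.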
Technical Perspective: Evaluating Sampled Metrics Is Challenging
Item recommendation algorithms rank the items in a catalogue from the most relevant to the least relevant for a given context (for example, a query) provided as input. Such algorithms are a key component of our daily interactions with digital systems, and their diffusion in society will only increase in the foreseeable future. Given this diffusion, comparing recommendation systems is a crucial endeavor. Item recommendation algorithms are usually compared using some metric (for example, average precision) that depends on the position of the truly relevant items in the ranking, produced by the algorithm, of all the items in a catalogue. The experimental evaluation and comparison of algorithms is far from easy.
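To make concrete how such a metric depends on the positions of relevant items, here is a minimal sketch of average precision over a single ranked list (one common formulation; evaluation toolkits vary in details such as truncation at top-n).

```python
def average_precision(ranked_items, relevant):
    """Average precision for one ranking: the mean of precision@k taken
    over the ranks k at which a truly relevant item appears.

    Rewarding relevant items near the top is exactly the position
    sensitivity the perspective describes.
    """
    relevant = set(relevant)
    hits, score = 0, 0.0
    for k, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            score += hits / k  # precision@k at this hit
    return score / max(len(relevant), 1)
```

For the ranking `["a", "b", "c", "d"]` with relevant items `{"a", "c"}`, the hits occur at ranks 1 and 3 with precisions 1 and 2/3, so the average precision is 5/6: moving "c" up to rank 2 would raise it to 1.0, which is why sampled or perturbed rankings can distort comparisons.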
Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks
Frolov, Evgeny; Oseledets, Ivan
Conventional collaborative filtering techniques treat the top-n recommendations problem as a task of generating a list of the most relevant items. This formulation, however, disregards the opposite goal: avoiding recommendations of completely irrelevant items. Because of this bias, standard algorithms, as well as commonly used evaluation metrics, become insensitive to negative feedback. To resolve this problem, we propose to treat user feedback as a categorical variable and model it with users and items in a ternary way. We employ a third-order tensor factorization technique and implement a higher-order folding-in method to support online recommendations. The method is equally sensitive to the entire spectrum of user ratings and is able to accurately predict relevant items even from negative-only feedback. Our method may partially eliminate the need for a complicated rating-elicitation process, as it provides means for personalized recommendations from the very beginning of an interaction with a recommender system. We also propose a modification of standard metrics which helps to reveal unwanted biases and account for sensitivity to negative feedback. Our model achieves state-of-the-art quality in standard recommendation tasks while significantly outperforming other methods in the cold-start "no-positive-feedback" scenarios.
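The "ternary" modelling idea can be sketched as follows: instead of a user-item matrix weighted by rating values, feedback becomes a binary third-order tensor with a categorical rating mode, from which per-mode factors can be extracted. This is a hedged illustration: the tensor encoding is as described in the abstract, but the HOSVD-style factor extraction below is a generic stand-in, not the authors' exact factorization or folding-in procedure.

```python
import numpy as np

def build_feedback_tensor(triples, n_users, n_items, n_ratings):
    """Encode (user, item, rating) feedback as a binary third-order
    tensor, treating the rating as a categorical mode rather than a
    scalar weight, so negative ratings carry signal of their own.
    """
    T = np.zeros((n_users, n_items, n_ratings))
    for u, i, r in triples:
        T[u, i, r] = 1.0
    return T

def hosvd_factors(T, ranks):
    """Per-mode factor matrices via truncated SVD of each mode
    unfolding (HOSVD-style). A generic sketch of the kind of Tucker
    factors a third-order factorization model builds on.
    """
    factors = []
    for mode, r in enumerate(ranks):
        # Unfold along `mode`: shape (T.shape[mode], product of the rest).
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :r])
    return factors
```

Because the rating is its own mode, a user who has given only low ratings still occupies a well-defined slice of the tensor, which is the property behind the cold-start "no-positive-feedback" claim.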
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (0.93)